Systems and methods for the detection of behavioral anomalies in applications.
Patent abstract:
The invention relates to a system and method for detecting a behavioral anomaly in an application. The method may comprise retrieving (202) historical usage information for an application on a computing device and identifying (204) at least one key metric from the historical usage information. The method may comprise generating a regression model (206) configured to predict the usage behavior associated with the application and generating a statistical model (208) configured to identify outliers in the data associated with the at least one key metric. The method may comprise receiving real-time usage information for the application (210). The method may comprise predicting, using the regression model, a usage pattern for the application (212) that indicates the predicted values of the at least one key metric. After determining that the usage information does not match the predicted usage pattern (214) and does not include a known outlier (216), the method may comprise detecting the behavioral anomaly (218) and generating an alarm (220).

Publication number: CH717594A2
Application number: CH00321/21
Filing date: 2021-03-25
Publication date: 2021-12-30
Inventors: Kulaga Andrey; Protasov Stanislav; Beloussov Serguei
Applicant: Acronis Int GmbH
IPC main class:
Patent description:
TECHNICAL FIELD

The present disclosure relates to the field of data security and, more specifically, to systems and methods for detecting behavioral anomalies in applications.

STATE OF THE ART

Stored data and applications are often subject to malicious cyber attacks, depending on their storage location and associated network connections. To monitor for intrusions and malware, companies use monitoring systems that detect behavioral anomalies. Conventional monitoring systems, however, require the manual definition of alarm thresholds for key application metrics. First of all, setting the alarm thresholds is laborious. Fixed alarm thresholds also fail to detect anomalous behavior whose key metrics stay within the limits defined by an administrator. For example, an administrator-set alarm threshold may identify a CPU utilization rate greater than 65% on a given server in the period between 12:00 and 6:00 as abnormal. A malicious entity that uses only 64% of the CPU, however, would not be detected. On the other hand, alarm thresholds can also produce various false positives. For example, an authorized user may legitimately use the server during the time frame described above such that 70% of the CPU is used at one point. Even though the server is being used by an authorized user, a conventional monitoring system would report an anomaly, yielding a false positive. Depending on how the system is designed to react to such a trigger, access by the authorized user can be made difficult or impossible. There is therefore a need for an improved approach to anomaly detection.

SUMMARY

Aspects of the disclosure relate to detecting a behavioral anomaly in an application. In one example, a method may comprise retrieving historical usage information for an application on a computing device. The method may comprise identifying at least one key metric from the historical usage information. The method may comprise generating a regression model configured to predict the usage behavior associated with the application based on the data associated with the at least one key metric. The method may comprise generating a statistical model configured to identify outliers in the data associated with the at least one key metric. After generating the regression model and the statistical model, the method may comprise receiving real-time usage information for the application. The method may comprise predicting, using the regression model, a usage pattern for the application indicating the predicted values of the at least one key metric. After determining that the usage information received in real time does not match the predicted usage pattern, the method may comprise determining, via the statistical model, whether the usage information includes a known outlier. After determining that the usage information does not include the known outlier, the method may comprise detecting the behavioral anomaly. The method may comprise generating an alarm indicative of the behavioral anomaly. In one example, the historical usage information occurred over a periodic time interval, and predicting the usage pattern further comprises using a version of the regression model associated with the time interval. In one example, the statistical model is a probability distribution that highlights the data points associated with the at least one key metric that are not anomalous.
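As a purely illustrative sketch of such a distribution-based check (the percentile cut-off, the data values, and the helper names are assumptions for illustration, not part of the claimed method), rare but previously observed values can be collected and then compared against new observations:

```python
import numpy as np

def known_outliers(historical_values, tail_probability=0.05):
    """Return the historical values that fall in the low-probability tails.

    These are rare but previously observed values; real-time usage that
    resembles them is treated as a known outlier rather than an anomaly.
    """
    values = np.asarray(historical_values, dtype=float)
    lower = np.quantile(values, tail_probability)
    upper = np.quantile(values, 1.0 - tail_probability)
    return values[(values < lower) | (values > upper)]

def matches_known_outlier(value, outliers, rel_tolerance=0.05):
    """True if the observed value is within a small tolerance of a past outlier."""
    return any(abs(value - o) <= rel_tolerance * abs(o) for o in outliers)

# Bytes read during a recurring one-minute window on past Mondays (synthetic).
history = [9.2e6, 9.6e6, 9.9e6, 10.1e6, 9.4e6, 15.2e6, 9.8e6, 9.5e6, 9.7e6, 15.0e6]
rare = known_outliers(history)
print(matches_known_outlier(14.5e6, rare))   # True: close to past rare readings
print(matches_known_outlier(20.0e6, rare))   # False: never seen, candidate anomaly
```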
In one example, the at least one key metric comprises at least one of: (1) client connections, (2) latency, (3) number of account lookups, (4) bytes read, and (5) number of file searches.

In one example, after determining that the usage information received in real time matches the predicted usage pattern, or that the usage information includes a known outlier, the method may comprise determining that the behavioral anomaly has not occurred and not generating the alarm.

In one example, determining that the usage information received in real time does not match the predicted usage pattern further comprises determining that an average difference between the values of the at least one key metric from the usage information received in real time and the predicted values of the at least one key metric according to the predicted usage pattern exceeds a threshold difference.

In one example, the method may comprise receiving an alarm response indicating that the behavioral anomaly is a false positive and automatically increasing the threshold difference.

In one example, the method may comprise receiving an alarm response indicating that the behavioral anomaly is a false positive and adjusting both the regression model and the statistical model based on the usage information received in real time, wherein the regression model is retrained on an updated dataset and the statistical model indicates an updated outlier.

It should be noted that the methods described above can be implemented in a system comprising a hardware processor. Alternatively, the methods can be implemented using computer-executable instructions stored on a non-transitory computer-readable medium.

The simplified summary of the exemplary aspects above serves to provide a basic understanding of the present disclosure. This summary is not an extensive overview of all contemplated aspects and is intended neither to identify key or critical elements of all aspects nor to delineate the scope of any or all aspects of this disclosure. Its sole purpose is to present one or more aspects in a simplified form as a prelude to the more detailed description of the disclosure that follows. To complete the foregoing, one or more aspects of the present disclosure include the features described and exemplified in the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and form part of this specification, illustrate one or more exemplary aspects of the present disclosure and, together with the detailed description, serve to explain their principles and implementations.

[0015] FIG. 1 is a block diagram illustrating a system for detecting behavioral anomalies in applications.

[0016] FIG. 2 is a flowchart illustrating a method for detecting behavioral anomalies in applications.

[0017] FIG. 3 presents an example of a general-purpose computer system on which embodiments of the present disclosure can be implemented.

DETAILED DESCRIPTION

Exemplary aspects are described herein in the context of a system, method, and computer program for detecting behavioral anomalies in applications. Those of ordinary skill in the art will realize that the following description is purely illustrative and is not intended to be limiting in any way. Other aspects will readily suggest themselves to those skilled in the art having the benefit of this disclosure. Reference will now be made in detail to implementations of the exemplary aspects as illustrated in the accompanying drawings.
The same reference indicators will be used, to the extent possible, throughout the drawings and the following description to refer to the same or similar elements.

To address the shortcomings of conventional monitoring systems, an anomaly detection system should require no manual work to train the models, work for any type of device, detect anomalies automatically, fine-tune the definition of an anomaly according to the user's needs, adapt its thresholds so that anomalies can be detected regardless of system load, be highly accurate, and reduce alert spam (for example, false positives).

[0020] FIG. 1 is a block diagram illustrating an exemplary system 100 for detecting behavioral anomalies in applications. The anomaly detection system (ADS) 102 may be a module of security software such as antivirus software. The ADS 102 can be stored in the storage device 104 to monitor anomalies on the storage device 104, or it can be stored on a different device and communicate with the storage device 104 over a network such as the Internet. The ADS 102 can be composed of several components, such as the data retriever 106, the machine learning module 108, the statistical module 110, and the security module 112.

In one example, the data retriever 106 of the ADS 102 can be configured to retrieve historical usage information for an application on a computing device (e.g., the computer system described in FIG. 3). Historical usage information may include details such as when the application was accessed, by whom, device status information (e.g., RAM consumption, CPU usage percentage, storage space, etc.), requests made to the application, requests made by the application, network connections used by the application (e.g., IP addresses), etc. In some examples, historical usage information can represent all information collected since the application was installed on the computing device. In some examples, historical usage information may cover a certain time frame (for example, from June 1, 2020 at 12:00 noon to June 2, 2020 at 12:00 noon). In some examples, historical usage information may cover a periodic time frame. For example, the data retriever 106 can retrieve the information for every Monday since the application was installed. Based on that Monday-specific data, the ADS 102 can predict the behavior for a following Monday.

From the historical usage information, the data retriever 106 can identify at least one key metric. A key metric associated with the application may be a number of client connections, command and execution latency, a number of account lookups, an amount of bytes read, or a number of file searches. Note that there may be many other key metrics as well, such as the amount of bytes written, the length of usage time, the functions accessed in the application, etc. The data retriever 106 can analyze the historical usage information to identify the at least one key metric and generate a data structure with the data associated with the at least one key metric. For example, if the identified key metric is the amount of bytes read in association with the application, the data structure may include timestamps and their respective numbers of bytes read. This data structure constitutes the training data 114 and is used by the ADS 102 to generate the machine learning module 108, which is configured to predict the usage behavior associated with the application based on the data associated with the at least one key metric. In some examples, the machine learning module 108 may be a one-class support vector machine (SVM) used for novelty detection.
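As a rough illustration of that option, the sketch below fits a one-class SVM on rows of historical key-metric values and flags a live window that looks unlike the training data. The use of scikit-learn, the metric values, and the scaling step are assumptions made for illustration only and are not prescribed by the disclosure:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Each training row holds key metrics for one observation window, e.g.
# [client_connections, latency_ms, account_lookups, bytes_read, file_searches].
training_data = np.array([
    [120, 35.0, 410, 9.6e6, 230],
    [118, 33.5, 395, 9.4e6, 225],
    [125, 36.2, 420, 9.9e6, 240],
    [119, 34.1, 400, 9.5e6, 228],
])

# Scale the features so that no single metric dominates the kernel distance.
mean, std = training_data.mean(axis=0), training_data.std(axis=0)
detector = OneClassSVM(nu=0.05, kernel="rbf", gamma="scale")
detector.fit((training_data - mean) / std)

# A real-time observation window; +1 means "looks like training data", -1 means novelty.
live_window = np.array([[310, 80.0, 900, 2.1e7, 610]])
print(detector.predict((live_window - mean) / std))  # likely [-1]
```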
The one-class SVM can be trained using the key metrics captured from the historical usage information. Since anomalies can be quite rare, most of the training data 114 may reflect only correct use of the application. A one-class SVM allows usage data that appears different from the training data 114 (i.e., from the correct key-metric values) to be classified as an anomaly. In some examples, the machine learning module 108 may be a machine learning algorithm that predicts a target value based on one or more independent variables. In some examples, if only one key metric is used, the machine learning module 108 can use a linear regression model. In other examples, if multiple key metrics are considered in generating the module 108, a polynomial regression model or a multivariable linear regression model may be used. The goal of the machine learning module 108 is to learn how the application has historically behaved and then predict how it will behave in the future. For example, if the application has had a certain latency and a certain number of searches on each of the last ten Mondays between 12:00 and 12:01, then the application can be expected to have the same latency and number of searches on a subsequent Monday. The machine learning module 108 can be trained and tested on the training data 114 using a training method such as stochastic gradient descent.

The ADS 102 can also generate the statistical module 110, configured to identify outliers in the data associated with the at least one key metric. The statistical module employs a probability distribution that highlights the data points associated with the at least one key metric that are not anomalous. Suppose the key metric is the number of bytes read. For the time interval under consideration, Monday between 12:00 and 12:01, the ADS 102 can create a probability distribution of the bytes read. In this probability distribution, the numbers of bytes read that are least likely to occur (and yet have occurred) are considered outliers. For example, 95% of the time the application may have had between 9 million and 10 million bytes read in the given interval. However, 5% of the time, the application had more than 15 million bytes read in the given interval. In some examples, the ADS 102 can set probability thresholds to identify outliers. In this case, any number of bytes read that has a probability of 5% or less is considered an outlier.

The machine learning module 108 and the statistical module 110 together provide a way to predict both how an application is likely to behave and how the application might behave in rare cases. There may be instances where a conventional monitoring system would declare an activity a harmful anomaly, even though the activity is associated with an outlier caused by an authorized user of the computing device.

After generating the machine learning model (e.g., a regression model) and the statistical model, the data retriever 106 of the ADS 102 can receive real-time usage information for the application. The ADS 102 can predict, using the machine learning module 108, a usage pattern for the application relating to the at least one key metric. The usage pattern indicates an expected set of key-metric values at some future time. For example, the machine learning module 108 may be given a time slot as input and will produce an expected number of bytes read during that interval. In this case, the output can be a data structure comprising a number of bytes read per second. The security module 112 may be a module configured to compare the predicted usage pattern with a real-time pattern.
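A rough sketch of what that per-second prediction could look like, using a plain linear regression over a second-of-interval feature; the feature choice, the synthetic numbers, and the use of scikit-learn are illustrative assumptions rather than the patented implementation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Historical observations for the recurring Monday 12:00-12:01 interval:
# one row per second within the interval, target = bytes read in that second.
seconds = np.tile(np.arange(60), 10).reshape(-1, 1)          # ten past Mondays
bytes_read = (160_000 + 500 * seconds.ravel()
              + np.random.default_rng(0).normal(0, 2_000, seconds.size))

model = LinearRegression().fit(seconds, bytes_read)

# Predicted usage pattern for the next Monday: one expected value per second.
predicted_pattern = model.predict(np.arange(60).reshape(-1, 1))
print(predicted_pattern.shape)   # (60,): sixty expected key-metric values
```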
For example, the output from the machine learning module 108 may consist of 60 data points for a time interval associated with Monday between 12:00 and 12:01 (one for each second). The data points can be the expected numbers of bytes read. The data retriever 106 can likewise provide 60 data points for the same time interval; these data points are the actual numbers of bytes read. In some examples, determining that the usage information received in real time does not match the predicted usage pattern comprises determining that an average difference between the values of the at least one key metric from the usage information received in real time and the predicted values of the at least one key metric according to the predicted usage pattern exceeds a threshold difference. For example, the average number of bytes read during the time interval may be 10 million, while the expected average number of bytes read is 5 million. The threshold difference can have an initial value of 2 million. The difference between the mean values is 5 million, which is greater than the threshold difference. As a result, the security module 112 can determine that the usage information is a potential anomaly. Note that the aggregate value used here can be an average, a standard deviation, a median, etc.

In some examples, the security module 112 can compare each respective data point and determine a percentage error in the prediction. For example, the security module 112 may determine a percentage error between the first data point in the predicted usage pattern and the first data point of the actual usage. For each data point, if the percentage error is greater than a threshold percentage error (for example, 20%), the security module 112 can determine that the received data (i.e., the usage information) does not match the predicted usage pattern at that point. To determine whether the usage information is a potential anomaly, the security module 112 can count the number of data points with percentage errors greater than the threshold percentage error. If the majority of the data points have percentage errors that exceed the threshold percentage error, the usage information can be identified as a potential anomaly.

The transition from "potential anomaly" to "anomaly" is then based on the statistical model. More specifically, the security module 112 determines, via the statistical module 110, whether the received usage information includes a known outlier rather than an anomaly. For example, a known outlier can show that the application in the past occasionally averaged a little less than 15 million bytes read during a given time interval (while the regression model predicted at most 10 million). If the received usage information also shows 14.5 million bytes read for a short time within the time interval, the received usage information can be considered an outlier. However, if the usage information shows a much higher number of bytes read (for example, 20 million), the security module 112 may determine that this amount of bytes has never been read in the past and may therefore constitute an anomaly. Thus, after determining that the usage information does not include an outlier, the security module 112 can detect the behavioral anomaly and generate an alarm indicative of the behavioral anomaly. On the other hand, after determining that the usage information received in real time matches the predicted usage pattern, or that the usage information includes a known outlier, the security module 112 determines that the behavioral anomaly has not occurred and does not generate the alarm.
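Taken together, the decision path described above might be sketched as follows. The threshold values, the majority-vote rule, and the helper names are illustrative assumptions, and the outlier test is assumed to be a distribution-based check like the one sketched earlier:

```python
import numpy as np

def is_potential_anomaly(actual, predicted,
                         mean_diff_threshold=2e6,
                         pct_error_threshold=0.20):
    """Flag a window whose real-time metrics stray too far from the prediction."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Check 1: average difference between observed and predicted values.
    if abs(actual.mean() - predicted.mean()) > mean_diff_threshold:
        return True
    # Check 2: majority of per-point percentage errors above the threshold.
    pct_errors = np.abs(actual - predicted) / np.maximum(np.abs(predicted), 1.0)
    return np.count_nonzero(pct_errors > pct_error_threshold) > actual.size / 2

def detect_anomaly(actual, predicted, known_outlier_check):
    """Alarm only if usage mismatches the prediction and is not a known outlier."""
    if not is_potential_anomaly(actual, predicted):
        return None                      # usage matches the predicted pattern
    if known_outlier_check(actual):
        return None                      # rare but previously observed behavior
    return "ALARM: behavioral anomaly detected for the monitored application"

actual = np.full(60, 10e6)       # observed bytes read per second
predicted = np.full(60, 5e6)     # values predicted by the regression model
print(detect_anomaly(actual, predicted, lambda window: False))
```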
In some examples, an anomaly can be detected when a window comprising several outliers is detected. For example, after receiving the usage information, the ADS 102 can generate another statistical model of the received usage information (in particular, only when the received behavior does not match the expected behavior, in order to conserve processing power). The newly received usage information may indicate that, during the time frame, the number of bytes read was approximately 14 million 95% of the time and only 7 million 5% of the time. This sustained percentage of readings above 14 million bytes can be compared with an anomalous-probability threshold (for example, 70%). When the percentage exceeds the anomalous-probability threshold, it can be determined that, although the number of bytes read corresponds to an outlier, the probability of the outlier occurring is now high enough for the behavior to be considered an anomaly.

In some examples, in order to detect a behavioral anomaly when an outlier is detected, the security module 112 determines whether more than a threshold number of key metrics with outliers is detected simultaneously. For example, one key metric with an outlier can be the number of bytes read; another can be the amount of latency. Suppose that, out of 5 key metrics tracked, 3 have outliers. After determining that a threshold quantity of key metrics has outliers, the security module 112 can detect a behavioral anomaly. In some examples, the security module 112 may detect a behavioral anomaly specifically when the key metrics that have outliers are independent key metrics. An independent key metric is a key metric that is not affected by another key metric. For example, the number of bytes read may be related to the number of file searches; consequently, if outliers are found in both of these key metrics, only one outlier is counted when comparing against the threshold quantity. Conversely, the number of bytes read may not be related to the application's number of account lookups; consequently, if outliers are found in both of those key metrics, they can be counted as two outliers. The ADS 102 can track all key metrics and each of their independent counterparts in a database.

[0034] Note that the use of the statistical model relaxes the machine learning module 108's dependence on thresholds. Even if the application is not used as historically expected (for example, the percentage errors or the mean difference exceed the threshold percentage error or the threshold difference, respectively), checking for the presence of outliers minimizes the possibility of false positives and false negatives: usage that would otherwise be incorrectly classified as an anomaly can be correctly classified as normal use if it turns out to be a known outlier according to the statistical model.

Furthermore, the thresholds used by the modules 108 and 110 can be adjusted based on whether the usage classified as an anomaly is actually an anomaly. For example, the security module 112 may generate an alarm indicating that the use of the application during a given time interval is an anomaly. This alarm can be delivered to a user's computing device (for example, that of an administrator of the storage device 104) in the form of an email, a text message, an audio output, an application notification, etc. After the alarm is generated, the ADS 102 may receive an alarm response indicating that the behavioral anomaly is a false positive. For example, the alarm may ask for confirmation of whether the use was by the authorized user or by an unauthorized entity. The user can indicate that the use was authorized.
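When the user does report a false positive, the feedback step could be sketched roughly as follows; the adjustment factor, the dictionary-based state, and the rebuild logic are illustrative assumptions rather than the patented procedure:

```python
import numpy as np

def handle_alarm_response(is_false_positive, state, flagged_window):
    """Adjust thresholds and models after the user responds to an alarm.

    `state` holds the detector's thresholds and its history of key-metric
    values; `flagged_window` is the window of values that raised the alarm.
    """
    if not is_false_positive:
        return state
    # Relax sensitivity so comparable authorized usage no longer alarms.
    state["mean_diff_threshold"] *= 1.5
    # Fold the flagged window back into the history as legitimate usage. The
    # statistical model is rebuilt below so these values become known outliers;
    # in a full implementation the regression model would also be retrained on
    # the updated history.
    state["history"] = np.append(state["history"], flagged_window)
    lower = np.quantile(state["history"], 0.05)
    upper = np.quantile(state["history"], 0.95)
    state["known_outliers"] = state["history"][
        (state["history"] < lower) | (state["history"] > upper)
    ]
    return state

# Example: the user reports that the alarmed window was authorized use.
state = {"mean_diff_threshold": 2e6,
         "history": np.array([9.4e6, 9.6e6, 9.8e6, 10.0e6, 9.5e6]),
         "known_outliers": np.array([])}
state = handle_alarm_response(True, state, np.array([14.5e6]))
print(state["mean_diff_threshold"], state["known_outliers"])
```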
In response, the security module 112 can adjust the thresholds used by the modules 108 and 110 (for example, by relaxing them). In the example of the threshold difference associated with the average values of the key metrics (both predicted and actual), the security module 112 can automatically increase the threshold difference. In some examples, after receiving an alarm response indicating that the behavioral anomaly is a false positive, the security module 112 can also adjust both the regression model and the statistical model based on the usage information received in real time. More specifically, the ADS 102 can retrain the regression model using an updated dataset in which the usage information received in real time is classified as non-anomalous, and can regenerate the statistical model, which can then identify the usage information as a known outlier.

[0037] FIG. 2 is a flowchart illustrating a method 200 for detecting behavioral anomalies in applications. At 202, the data retriever 106 retrieves historical usage information for an application on a computing device. At 204, the data retriever 106 identifies at least one key metric from the historical usage information. At 206, the anomaly detection system 102 generates a machine learning module 108 configured to predict the usage behavior associated with the application based on the data associated with the at least one key metric. At 208, the anomaly detection system 102 generates a statistical module 110 configured to identify outliers in the data associated with the at least one key metric. At 210, the data retriever 106 receives real-time usage information for the application. At 212, the anomaly detection system 102 predicts, using the machine learning module 108, a usage pattern for the application relating to the at least one key metric. At 214, the anomaly detection system 102 determines whether the usage information matches the usage pattern. After determining that the usage information does not match the usage pattern, at 216 the anomaly detection system 102 determines whether the usage information includes a known outlier. After determining that the usage information does not include a known outlier, at 218 the security module 112 detects a behavioral anomaly and at 220 the security module 112 generates an alarm. If at 214 or 216 the anomaly detection system 102 determines either that the usage information matches the usage pattern or that the usage information includes a known outlier, the method 200 returns to 210 and the anomaly detection system 102 continues to check for behavioral anomalies.

[0039] FIG. 3 is a block diagram illustrating a computer system 20 on which embodiments of systems and methods for detecting behavioral anomalies in applications can be implemented. The computer system 20 can take the form of multiple computing devices or of a single computing device, for example a desktop computer, a notebook, a laptop, a mobile computing device, a smartphone, a tablet, a server, a mainframe, an embedded device, and other forms of computing devices. As shown, the computer system 20 comprises a central processing unit (CPU) 21, a system memory 22, and a system bus 23 that connects the various components of the system, including the memory associated with the central processing unit 21. The system bus 23 can comprise a memory bus or a memory bus controller, a peripheral bus, and a local bus capable of interacting with any other bus architecture.
Examples of buses may include PCI, ISA, PCI-Express, HyperTransport™, InfiniBand™, Serial ATA, I2C, and other suitable interconnects. The central processing unit 21 (also called processor) can comprise a single processor or a set of processors with one or more cores. The processor 21 may execute one or more sets of computer-executable code that implement the techniques of the present disclosure. For example, the processor 21 can execute any of the operations/steps of FIGS. 1-2. The system memory 22 can be any memory for storing the data used herein and/or the computer programs executable by the processor 21. The system memory 22 can include volatile memory such as random access memory (RAM) 25 and non-volatile memory such as read-only memory (ROM) 24, flash memory, etc., or a combination thereof. The basic input/output system (BIOS) 26 can store the basic procedures for transferring information between elements of the computer system 20, such as those used when loading the operating system with the use of the ROM 24.

The computer system 20 may comprise one or more storage devices, such as one or more removable storage devices 27, one or more non-removable storage devices 28, or a combination thereof. The one or more removable storage devices 27 and non-removable storage devices 28 are connected to the system bus 23 via a storage interface 32. In one example, the storage devices and the corresponding computer storage media are power-independent modules for the storage of computer instructions, data structures, program modules, and other data of the computer system 20. The system memory 22, the removable storage devices 27, and the non-removable storage devices 28 can use a variety of computer storage media. Examples of computer storage media include on-board memory such as cache, SRAM, DRAM, zero-capacitor RAM, dual-transistor RAM, eDRAM, EDO RAM, DDR RAM, EEPROM, NRAM, RRAM, SONOS, and PRAM; flash memory or other memory technologies such as solid state drives (SSDs) or flash drives; storage on magnetic cassettes, magnetic tapes, and magnetic disks such as, for example, hard disk drives or floppy disks; optical storage such as, for example, compact discs (CD-ROMs) or digital versatile discs (DVDs); and any other medium which can be used to store the desired data and which can be accessed by the computer system 20. The system memory 22, the removable storage devices 27, and the non-removable storage devices 28 of the computer system 20 can be used to store an operating system 35, additional program applications 37, other program modules 38, and program data 39.

The computer system 20 may include a peripheral interface 46 for communicating data from input devices 40, such as a keyboard, mouse, stylus, game controller, voice input device, or touch input device, or from other peripheral devices, such as a printer or scanner, through one or more I/O ports, such as a serial port, a parallel port, a universal serial bus (USB), or another peripheral interface. A display device 47, such as one or more monitors, projectors, or integrated displays, may also be connected to the system bus 23 through an output interface 48, such as a video adapter. In addition to the display devices 47, the computer system 20 can be equipped with other peripheral output devices (not shown), such as speakers and other audiovisual devices. The computer system 20 may operate in a network environment using a network connection to one or more remote computers 49.
The remote computer(s) 49 may be local workstations or servers comprising most or all of the elements mentioned above in the description of the nature of the computer system 20. Other devices may also be present in the computer network, such as, but not limited to, routers, network stations, peer devices, or other network nodes. The computer system 20 may comprise one or more network interfaces 51 or network adapters for communicating with the remote computers 49 through one or more networks, such as a local area network (LAN) 50, a wide area network (WAN), an intranet, and the Internet. Examples of the network interface 51 may include an Ethernet interface, a Frame Relay interface, a SONET interface, and wireless interfaces.

Aspects of the present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer storage medium (or media) bearing computer program instructions for causing a processor to carry out aspects of this disclosure. The computer storage medium may be a tangible device capable of retaining and storing program code in the form of instructions or data structures that can be accessed by a processor of a computing device, such as the computer system 20. The computer storage medium may be an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination thereof. By way of example, such a computer storage medium may comprise a random access memory (RAM), a read-only memory (ROM), an EEPROM, a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a flash memory, a hard disk, a portable disk, a memory stick, a floppy disk, or even a mechanically encoded device such as punch cards or raised structures in a groove with instructions recorded thereon. As used herein, a computer storage medium is not to be construed as consisting of transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media, or electrical signals transmitted through a wire.

The computer program instructions described herein can be downloaded to the respective computing devices from a computer storage medium or to an external computer or external storage device via a network, for example the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network interface in each computing device receives computer program instructions from the network and forwards the computer program instructions for storage on a computer storage medium within the respective computing device. Computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language and conventional procedural programming languages.
The computer program instructions can be executed entirely on the user's computer, partly on the user's computer, as a standalone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter case, the remote computer can be connected to the user's computer through any type of network, including a LAN or a WAN, or the connection can be made to an external computer (for example, through the Internet). In some embodiments, electronic circuitry including, for example, programmable logic circuits, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs) may execute the computer program instructions, using state information of the computer program instructions to personalize the electronic circuitry, in order to carry out aspects of this disclosure.

In various examples, the systems and methods described in the present disclosure can be addressed in terms of modules. The term "module" as used herein refers to an actual device, component, or arrangement of components, implemented in hardware, such as via an application-specific integrated circuit (ASIC) or FPGA, or as a combination of hardware and software, such as through a microprocessor system and a set of instructions that implement the functionality of the module and that (while being executed) transform the microprocessor system into a special-purpose device. A module can also be implemented as a combination of the two, with some functions facilitated by hardware alone and other functions facilitated by a combination of hardware and software. In some implementations, at least part, and in some cases all, of a module can run on the processor of a computer system. Consequently, each module can be realized in a variety of suitable configurations and need not be limited to any particular implementation exemplified herein.

[0049] For the sake of clarity, not all of the routine features of the embodiments are disclosed here. It will be appreciated that in the development of any actual implementation of this disclosure numerous implementation-specific decisions must be made in order to achieve the developer's specific goals, and that these specific goals will vary for different implementations and different developers. It is understood that such a development effort might be complex and time-consuming, but it would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure. Furthermore, it is understood that the phraseology or terminology used herein is for the purpose of description and not of restriction, so that the terminology or phraseology of this specification is to be interpreted by those skilled in the art in light of the teachings and guidelines presented here, in combination with the knowledge of those proficient in the relevant art or arts. Moreover, no term in the specification or claims shall be given an uncommon or special meaning unless explicitly stated as such. The various embodiments illustrated herein include known present and future equivalents of the known modules referred to herein for illustrative purposes. Further, while aspects and applications have been shown and described, it will be apparent to those skilled in the art having the benefit of this disclosure that many more modifications than those mentioned above are possible without departing from the inventive concepts disclosed herein.
Claims:
Claims (16)

[1] 1. A method for detecting a behavioral anomaly in an application, the method comprising:
retrieving historical usage information for an application on a computing device;
identifying at least one key metric from the historical usage information;
generating a regression model configured to predict the usage behavior associated with the application based on the data associated with the at least one key metric;
generating a statistical model configured to identify outliers in the data associated with the at least one key metric;
after generating the regression model and the statistical model, receiving real-time usage information for the application;
predicting, using the regression model, a usage pattern for the application that indicates the predicted values of the at least one key metric;
after determining that the usage information received in real time does not match the predicted usage pattern, determining, using the statistical model, whether the usage information includes a known outlier;
after determining that the usage information does not include the known outlier, detecting the behavioral anomaly; and
generating an alarm indicative of the behavioral anomaly.

[2] The method according to claim 1, wherein the historical usage information occurred over a periodic time interval, and wherein predicting the usage pattern further comprises using a version of the regression model associated with the time interval.

[3] The method according to claim 1, wherein the statistical model is a probability distribution that highlights the data points associated with the at least one key metric that are not anomalous.

[4] The method according to claim 1, wherein the at least one key metric comprises at least one of:
(1) client connections,
(2) latency,
(3) number of account lookups,
(4) bytes read, and
(5) number of file searches.

[5] The method according to claim 1, further comprising:
after determining that the usage information received in real time matches the predicted usage pattern, or that the usage information includes the known outlier, determining that the behavioral anomaly has not occurred and not generating the alarm.

[6] The method according to claim 1, wherein determining that the usage information received in real time does not match the predicted usage pattern further comprises:
determining that an average difference between the values of the at least one key metric from the usage information received in real time and the predicted values of the at least one key metric according to the predicted usage pattern exceeds a threshold difference.

[7] The method according to claim 6, further comprising:
receiving an alarm response indicating that the behavioral anomaly is a false positive; and
automatically increasing the threshold difference.

[8] The method according to claim 1, further comprising:
receiving an alarm response indicating that the behavioral anomaly is a false positive; and
adjusting both the regression model and the statistical model based on the usage information received in real time, wherein the regression model is retrained on an updated dataset and the statistical model indicates an updated outlier.
[9] 9. A system for detecting a behavioral anomaly in an application, wherein the system comprises:
a hardware processor configured to:
retrieve historical usage information for an application on a computing device;
identify at least one key metric from the historical usage information;
generate a regression model configured to predict the usage behavior associated with the application based on the data associated with the at least one key metric;
generate a statistical model configured to identify outliers in the data associated with the at least one key metric;
after generating the regression model and the statistical model, receive real-time usage information for the application;
predict, using the regression model, a usage pattern for the application that indicates the predicted values of the at least one key metric;
after determining that the usage information received in real time does not match the predicted usage pattern, determine, using the statistical model, whether the usage information includes a known outlier;
after determining that the usage information does not include the known outlier, detect the behavioral anomaly; and
generate an alarm indicative of the behavioral anomaly.

[10] The system according to claim 9, wherein the historical usage information occurred over a periodic time interval, and wherein predicting the usage pattern further comprises using a version of the regression model associated with the time interval.

[11] The system according to claim 9, wherein the statistical model is a probability distribution highlighting the data points associated with the at least one key metric that are not anomalous.

[12] The system according to claim 9, wherein the at least one key metric comprises at least one of:
(1) client connections,
(2) latency,
(3) number of account lookups,
(4) bytes read, and
(5) number of file searches.

[13] The system according to claim 9, wherein the hardware processor is further configured to:
after determining that the usage information received in real time matches the predicted usage pattern, or that the usage information includes the known outlier, determine that the behavioral anomaly has not occurred and not generate the alarm.

[14] The system according to claim 9, wherein, to determine that the usage information received in real time does not match the predicted usage pattern, the hardware processor is configured to:
determine that an average difference between the values of the at least one key metric from the usage information received in real time and the predicted values of the at least one key metric according to the predicted usage pattern exceeds a threshold difference.

[15] The system according to claim 14, wherein the hardware processor is further configured to:
receive an alarm response indicating that the behavioral anomaly is a false positive; and
automatically increase the threshold difference.

[16] The system according to claim 9, wherein the hardware processor is further configured to:
receive an alarm response indicating that the behavioral anomaly is a false positive; and
adjust both the regression model and the statistical model based on the usage information received in real time, wherein the regression model is retrained on an updated dataset and the statistical model indicates an updated outlier.
Patent family:
Publication number | Publication date
JP2022008007A | 2022-01-13
EP3929782A1 | 2021-12-29
EP3929782A4 | 2021-12-29
US20210406109A1 | 2021-12-30
Legal status:
Priority:
Application number | Filing date
US202063044587P | 2020-06-26
US17/180,912